Autotuning of Pattern Runtimes for Accelerated Parallel Systems

Authors

  • Enes Bajrovic
  • Siegfried Benkner
  • Jirí Dokulil
  • Martin Sandrieser
Abstract

Parallel architectures with node-level accelerators promise significant performance improvements over conventional homogeneous systems. To cope with the increased complexity of programming such systems, various pattern-based programming libraries have become available. In this paper we present our work on providing autotuning capabilities for two runtime libraries that offer parallel programming patterns on state-of-the-art heterogeneous hardware. We give a brief overview of these runtime libraries, outline possible integration with existing tuning frameworks, and present initial experimental results.
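To illustrate the general idea of tuning a pattern runtime parameter, the following is a minimal, generic sketch. It does not use the paper's actual runtime libraries or tuning framework (which are not shown here); instead it uses plain C++17 threads and a hypothetical chunk-size parameter, exploring a small candidate set by timing each configuration and keeping the fastest.

```cpp
// Minimal generic sketch of offline autotuning for a parallel pattern.
// The pattern runtimes and the tuning framework from the paper are not
// reproduced here; the "chunk size" parameter and candidate set below are
// purely illustrative assumptions.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

// A simple parallel-map pattern: apply f to each element, splitting the
// input into chunks of `chunk_size` elements handled by worker threads.
template <typename F>
void parallel_map(std::vector<double>& data, std::size_t chunk_size, F f) {
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk_size) {
        std::size_t end = std::min(begin + chunk_size, data.size());
        workers.emplace_back([&data, begin, end, f] {
            for (std::size_t i = begin; i < end; ++i) data[i] = f(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}

int main() {
    std::vector<double> data(1 << 20, 1.0);
    // Candidate values of the tunable parameter explored by the autotuner.
    std::vector<std::size_t> candidates = {1 << 12, 1 << 14, 1 << 16, 1 << 18};

    std::size_t best_chunk = candidates.front();
    double best_ms = 1e300;
    for (std::size_t chunk : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        parallel_map(data, chunk, [](double x) { return x * 1.0001 + 0.5; });
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best_ms) { best_ms = ms; best_chunk = chunk; }
    }
    std::cout << "best chunk size: " << best_chunk
              << " (" << best_ms << " ms)\n";
}
```

In a real pattern runtime the tunable parameters (e.g., work-group sizes, device selection, or scheduling policies) would be exposed by the library and explored by a dedicated tuning framework rather than a hand-written loop.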


Related resources

pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures

We have developed pOSKI: the Parallel Optimized Sparse Kernel Interface – an autotuning framework to optimize Sparse Matrix Vector Multiply (SpMV) performance on emerging shared memory multicore architectures. Our autotuning methodology extends previous work done in the scientific computing community targeting serial architectures. In addition to previously explored parallel optimizations, we f...


Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and ma...


Application-independent Autotuning for GPUs

Autotuning is an established technique for adjusting performance-critical parameters of applications to their specific run-time environment. In this paper, we investigate the potential of online autotuning for general purpose computation on GPUs. Our application-independent autotuner AtuneRT optimizes GPU-specific parameters such as block size and loop-unrolling degree. We also discuss the pecu...


Statement of Research

History has shown the benefits of high-level languages, language design, and managed language runtimes for how programmers develop complex and sophisticated systems. High-level languages, such as Java and Standard ML, are strongly typed and provide rich abstraction mechanisms, thereby reducing the time and effort to develop software. Language primitives and abstractions provide semantic guarante...


MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems

MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilitie...




Publication date: 2013